Overview

Dataset statistics

Number of variables: 9
Number of observations: 637
Missing cells: 278
Missing cells (%): 4.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 44.9 KiB
Average record size in memory: 72.2 B

Variable types

Numeric: 8
Categorical: 1

Alerts

Outcome has constant value "0.0" (Constant)
Pregnancies is highly correlated with Age (High correlation)
Age is highly correlated with Pregnancies (High correlation)
SkinThickness is highly correlated with Insulin and 1 other field (High correlation)
Insulin is highly correlated with SkinThickness (High correlation)
BMI is highly correlated with SkinThickness (High correlation)
Pregnancies has 18 (2.8%) missing values (Missing)
Glucose has 10 (1.6%) missing values (Missing)
SkinThickness has 12 (1.9%) missing values (Missing)
BMI has 8 (1.3%) missing values (Missing)
DiabetesPedigreeFunction has 12 (1.9%) missing values (Missing)
Age has 15 (2.4%) missing values (Missing)
Outcome has 203 (31.9%) missing values (Missing)
Pregnancies has 90 (14.1%) zeros (Zeros)

Reproduction

Analysis started: 2022-09-19 19:21:02.989096
Analysis finished: 2022-09-19 19:21:19.016608
Duration: 16.03 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

Pregnancies
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
ZEROS

Distinct: 12
Distinct (%): 1.9%
Missing: 18
Missing (%): 2.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.626817447
Minimum: 0
Maximum: 11
Zeros: 90
Zeros (%): 14.1%
Negative: 0
Negative (%): 0.0%
Memory size: 5.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 3
Q3: 6
95-th percentile: 10
Maximum: 11
Range: 11
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.028664255
Coefficient of variation (CV): 0.8350749104
Kurtosis: -0.5790743989
Mean: 3.626817447
Median Absolute Deviation (MAD): 2
Skewness: 0.6754738387
Sum: 2245
Variance: 9.172807169
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=12)]

Common values

Value             Count  Frequency (%)
1                 108    17.0%
0                 90     14.1%
2                 87     13.7%
3                 62     9.7%
4                 58     9.1%
5                 48     7.5%
6                 43     6.8%
7                 37     5.8%
8                 28     4.4%
9                 26     4.1%
Other values (2)  32     5.0%

Minimum 10 values

Value  Count  Frequency (%)
0      90     14.1%
1      108    17.0%
2      87     13.7%
3      62     9.7%
4      58     9.1%
5      48     7.5%
6      43     6.8%
7      37     5.8%
8      28     4.4%
9      26     4.1%

Maximum 10 values

Value  Count  Frequency (%)
11     10     1.6%
10     22     3.5%
9      26     4.1%
8      28     4.4%
7      37     5.8%
6      43     6.8%
5      48     7.5%
4      58     9.1%
3      62     9.7%
2      87     13.7%

Glucose
Real number (ℝ≥0)

MISSING

Distinct: 124
Distinct (%): 19.8%
Missing: 10
Missing (%): 1.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 116.4808176
Minimum: 44
Maximum: 191
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.1 KiB

Quantile statistics

Minimum: 44
5-th percentile: 79
Q1: 97
Median: 112
Q3: 133
95-th percentile: 168
Maximum: 191
Range: 147
Interquartile range (IQR): 36

Descriptive statistics

Standard deviation: 27.06003941
Coefficient of variation (CV): 0.2323132681
Kurtosis: -0.1212147722
Mean: 116.4808176
Median Absolute Deviation (MAD): 17
Skewness: 0.5017046811
Sum: 73033.47266
Variance: 732.2457329
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
99                  16     2.5%
125                 14     2.2%
106                 14     2.2%
100                 14     2.2%
112                 13     2.0%
95                  13     2.0%
102                 13     2.0%
111                 12     1.9%
108                 12     1.9%
107                 11     1.7%
Other values (114)  495    77.7%

Minimum 10 values

Value  Count  Frequency (%)
44     1      0.2%
56     1      0.2%
57     2      0.3%
61     1      0.2%
62     1      0.2%
65     1      0.2%
67     1      0.2%
68     1      0.2%
71     4      0.6%
72     1      0.2%

Maximum 10 values

Value  Count  Frequency (%)
191    1      0.2%
190    1      0.2%
188    1      0.2%
184    2      0.3%
183    3      0.5%
182    1      0.2%
181    1      0.2%
180    4      0.6%
179    5      0.8%
178    1      0.2%

BloodPressure
Real number (ℝ≥0)

Distinct: 36
Distinct (%): 5.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 71.66356579
Minimum: 40
Maximum: 98
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.1 KiB

Quantile statistics

Minimum: 40
5-th percentile: 54
Q1: 64
Median: 70
Q3: 78
95-th percentile: 90
Maximum: 98
Range: 58
Interquartile range (IQR): 14

Descriptive statistics

Standard deviation: 10.73919965
Coefficient of variation (CV): 0.1498557814
Kurtosis: -0.2040459556
Mean: 71.66356579
Median Absolute Deviation (MAD): 8
Skewness: -0.02198672912
Sum: 45649.69141
Variance: 115.330409
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=36)]

Common values

Value              Count  Frequency (%)
70                 49     7.7%
74                 46     7.2%
64                 39     6.1%
68                 39     6.1%
72                 37     5.8%
78                 37     5.8%
69.10546875        35     5.5%
80                 35     5.5%
76                 31     4.9%
60                 30     4.7%
Other values (26)  259    40.7%

Minimum 10 values

Value  Count  Frequency (%)
40     1      0.2%
44     4      0.6%
46     1      0.2%
48     4      0.6%
50     11     1.7%
52     10     1.6%
54     11     1.7%
55     2      0.3%
56     12     1.9%
58     14     2.2%

Maximum 10 values

Value  Count  Frequency (%)
98     3      0.5%
96     4      0.6%
95     1      0.2%
94     6      0.9%
92     8      1.3%
90     16     2.5%
88     22     3.5%
86     18     2.8%
85     6      0.9%
84     15     2.4%

SkinThickness
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 40
Distinct (%): 6.4%
Missing: 12
Missing (%): 1.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 25.01615
Minimum: 8
Maximum: 47
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.1 KiB

Quantile statistics

Minimum: 8
5-th percentile: 15
Q1: 20.53645833
Median: 20.53645833
Q3: 30
95-th percentile: 41
Maximum: 47
Range: 39
Interquartile range (IQR): 9.463541667

Descriptive statistics

Standard deviation: 7.939049498
Coefficient of variation (CV): 0.3173569673
Kurtosis: -0.09060124681
Mean: 25.01615
Median Absolute Deviation (MAD): 3.463541667
Skewness: 0.7787363824
Sum: 15635.09375
Variance: 63.02850692
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=40)]

Common values

Value              Count  Frequency (%)
20.53645833        222    34.9%
32                 25     3.9%
30                 24     3.8%
23                 19     3.0%
18                 19     3.0%
27                 18     2.8%
28                 18     2.8%
19                 17     2.7%
31                 15     2.4%
15                 14     2.2%
Other values (30)  234    36.7%

Minimum 10 values

Value  Count  Frequency (%)
8      1      0.2%
10     4      0.6%
11     6      0.9%
12     6      0.9%
13     8      1.3%
14     4      0.6%
15     14     2.2%
16     5      0.8%
17     11     1.7%
18     19     3.0%

Maximum 10 values

Value  Count  Frequency (%)
47     2      0.3%
46     6      0.9%
45     2      0.3%
44     3      0.5%
43     4      0.6%
42     5      0.8%
41     12     1.9%
40     14     2.2%
39     11     1.7%
38     5      0.8%

Insulin
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 103
Distinct (%): 16.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 89.22733108
Minimum: 22
Maximum: 180
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.1 KiB

Quantile statistics

Minimum: 22
5-th percentile: 50
Q1: 79.79947917
Median: 79.79947917
Q3: 88
95-th percentile: 159.2
Maximum: 180
Range: 158
Interquartile range (IQR): 8.200520833

Descriptive statistics

Standard deviation: 29.26401127
Coefficient of variation (CV): 0.3279713841
Kurtosis: 1.825112749
Mean: 89.22733108
Median Absolute Deviation (MAD): 0
Skewness: 1.347334711
Sum: 56837.8099
Variance: 856.3823557
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value              Count  Frequency (%)
79.79947917        365    57.3%
105                11     1.7%
140                8      1.3%
130                8      1.3%
180                7      1.1%
94                 7      1.1%
100                7      1.1%
120                7      1.1%
115                6      0.9%
110                6      0.9%
Other values (93)  205    32.2%

Minimum 10 values

Value  Count  Frequency (%)
22     1      0.2%
23     2      0.3%
29     1      0.2%
32     1      0.2%
36     3      0.5%
37     2      0.3%
38     1      0.2%
40     2      0.3%
41     1      0.2%
42     1      0.2%

Maximum 10 values

Value  Count  Frequency (%)
180    7      1.1%
178    1      0.2%
176    3      0.5%
175    3      0.5%
171    1      0.2%
170    2      0.3%
168    4      0.6%
167    2      0.3%
166    1      0.2%
165    4      0.6%

BMI
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 220
Distinct (%): 35.0%
Missing: 8
Missing (%): 1.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 31.4807923
Minimum: 18.2
Maximum: 46.8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.1 KiB

Quantile statistics

Minimum: 18.2
5-th percentile: 21.9
Q1: 26.9
Median: 31.6
Q3: 35.4
95-th percentile: 42.86
Maximum: 46.8
Range: 28.6
Interquartile range (IQR): 8.5

Descriptive statistics

Standard deviation: 6.122196083
Coefficient of variation (CV): 0.1944740153
Kurtosis: -0.3737853438
Mean: 31.4807923
Median Absolute Deviation (MAD): 4.2
Skewness: 0.2507719123
Sum: 19801.41836
Variance: 37.48128488
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
31.6                11     1.7%
31.99257812         11     1.7%
31.2                10     1.6%
33.3                10     1.6%
32                  10     1.6%
32.4                9      1.4%
30.8                8      1.3%
34.2                8      1.3%
29.7                8      1.3%
30                  7      1.1%
Other values (210)  537    84.3%
(Missing)           8      1.3%

Minimum 10 values

Value  Count  Frequency (%)
18.2   3      0.5%
18.4   1      0.2%
19.1   1      0.2%
19.3   1      0.2%
19.4   1      0.2%
19.5   2      0.3%
19.6   2      0.3%
19.9   1      0.2%
20     1      0.2%
20.4   2      0.3%

Maximum 10 values

Value  Count  Frequency (%)
46.8   2      0.3%
46.7   1      0.2%
46.3   1      0.2%
46.2   1      0.2%
46.1   2      0.3%
45.7   1      0.2%
45.6   1      0.2%
45.5   1      0.2%
45.3   3      0.5%
45.2   1      0.2%

DiabetesPedigreeFunction
Real number (ℝ≥0)

MISSING

Distinct: 435
Distinct (%): 69.6%
Missing: 12
Missing (%): 1.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.428736
Minimum: 0.078
Maximum: 1.353
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.1 KiB

Quantile statistics

Minimum: 0.078
5-th percentile: 0.137
Q1: 0.236
Median: 0.343
Q3: 0.582
95-th percentile: 0.9538
Maximum: 1.353
Range: 1.275
Interquartile range (IQR): 0.346

Descriptive statistics

Standard deviation: 0.264833497
Coefficient of variation (CV): 0.6177076267
Kurtosis: 0.9643082476
Mean: 0.428736
Median Absolute Deviation (MAD): 0.147
Skewness: 1.164151679
Sum: 267.96
Variance: 0.07013678115
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
0.268               5      0.8%
0.261               5      0.8%
0.254               5      0.8%
0.258               5      0.8%
0.263               4      0.6%
0.259               4      0.6%
0.238               4      0.6%
0.284               4      0.6%
0.245               4      0.6%
0.304               4      0.6%
Other values (425)  581    91.2%
(Missing)           12     1.9%

Minimum 10 values

Value  Count  Frequency (%)
0.078  1      0.2%
0.084  1      0.2%
0.085  2      0.3%
0.088  2      0.3%
0.089  1      0.2%
0.092  1      0.2%
0.096  1      0.2%
0.1    1      0.2%
0.101  1      0.2%
0.102  1      0.2%

Maximum 10 values

Value  Count  Frequency (%)
1.353  1      0.2%
1.321  1      0.2%
1.318  1      0.2%
1.292  1      0.2%
1.282  1      0.2%
1.268  1      0.2%
1.251  1      0.2%
1.224  1      0.2%
1.222  1      0.2%
1.213  1      0.2%

Age
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 44
Distinct (%): 7.1%
Missing: 15
Missing (%): 2.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.23311897
Minimum: 21
Maximum: 64
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.1 KiB

Quantile statistics

Minimum: 21
5-th percentile: 21
Q1: 24
Median: 28.5
Q3: 39
95-th percentile: 53.95
Maximum: 64
Range: 43
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 10.50674178
Coefficient of variation (CV): 0.3259610647
Kurtosis: 0.1708331769
Mean: 32.23311897
Median Absolute Deviation (MAD): 6.5
Skewness: 0.9962409938
Sum: 20049
Variance: 110.3916228
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=44)]

Common values

Value              Count  Frequency (%)
22                 62     9.7%
21                 54     8.5%
25                 41     6.4%
24                 40     6.3%
28                 31     4.9%
23                 29     4.6%
26                 27     4.2%
27                 27     4.2%
29                 21     3.3%
41                 19     3.0%
Other values (34)  271    42.5%

Minimum 10 values

Value  Count  Frequency (%)
21     54     8.5%
22     62     9.7%
23     29     4.6%
24     40     6.3%
25     41     6.4%
26     27     4.2%
27     27     4.2%
28     31     4.9%
29     21     3.3%
30     17     2.7%

Maximum 10 values

Value  Count  Frequency (%)
64     1      0.2%
63     4      0.6%
62     4      0.6%
61     1      0.2%
60     3      0.5%
59     2      0.3%
58     3      0.5%
57     4      0.6%
56     2      0.3%
55     3      0.5%

Outcome
Categorical

CONSTANT
MISSING
REJECTED

Distinct: 1
Distinct (%): 0.2%
Missing: 203
Missing (%): 31.9%
Memory size: 5.1 KiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 1302
Distinct characters: 2
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value      Count  Frequency (%)
0.0        434    68.1%
(Missing)  203    31.9%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value  Count  Frequency (%)
0.0    434    100.0%

Most occurring characters

Value  Count  Frequency (%)
0      868    66.7%
.      434    33.3%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     868    66.7%
Other Punctuation  434    33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      868    100.0%

Other Punctuation
Value  Count  Frequency (%)
.      434    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  1302   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      868    66.7%
.      434    33.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1302   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      868    66.7%
.      434    33.3%

Interactions

[Figures: 64 pairwise interaction plots rendered with Matplotlib; the images are not reproduced here.]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
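
The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code the report generator runs: the function names `average_ranks`, `pearson_r`, and `spearman_rho` are hypothetical, and giving tied values the average of their rank positions is an assumption about tie handling.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    # The 1/n factors of covariance and variance cancel, so they are omitted.
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson_r(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship (y = x**3).
x = [1, 2, 3, 4]
y = [1, 8, 27, 64]
print(spearman_rho(x, y), pearson_r(x, y))
```

For the cubic example, the ranks of x and y coincide, so ρ reaches 1 while Pearson's r on the raw values stays below 1.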

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
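
As a minimal sketch of that formula, the following plain-Python function divides the covariance by the product of the standard deviations. The name `pearson_r` is hypothetical and this is not the report generator's own implementation; the short check afterwards illustrates the invariance under changes of location and positive scale.

```python
import math


def pearson_r(x, y):
    """Pearson's r: cov(x, y) / (std(x) * std(y))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (std_x * std_y)


x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 3.5, 9.0]

r_original = pearson_r(x, y)
# Shift and rescale y (positive scale): r should be unchanged.
r_rescaled = pearson_r(x, [3.0 * value + 5.0 for value in y])
print(r_original, r_rescaled)
```

A negative scale factor would flip the sign of r but leave its magnitude unchanged.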

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
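
The pair-counting definition can be sketched directly. The function name `kendall_tau_a` is hypothetical, and the sketch implements the uncorrected form described in the text (often called tau-a). Tie-corrected variants such as tau-b exist and are not implemented here, so a library's result may differ on data with ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
        # direction == 0 means a tie in x or y; it counts as neither.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Three observations: two concordant pairs and one discordant pair.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

This loop is quadratic in the number of observations, which is acceptable for a few hundred rows but slow for large data.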

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

[Figure: Count] A simple visualization of nullity by column.
[Figure: Matrix] The nullity matrix is a data-dense display which lets you quickly pick out visual patterns in data completion.
[Figure: Heatmap] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
[Figure: Dendrogram] The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
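
One way to read the nullity correlation behind the heatmap is as an ordinary correlation between "value is present" indicators of two columns. The sketch below works under that assumption. The names `is_missing` and `nullity_correlation` are hypothetical, missing cells are assumed to be encoded as `None` or NaN, and this is not the plotting library's own code.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption about the encoding)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def nullity_correlation(col_a, col_b):
    """Pearson correlation of the presence indicators of two columns.

    Returns None when either column is fully present or fully missing,
    because a constant indicator has no defined correlation.
    """
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("columns must be non-empty and of equal length")
    present_a = [0.0 if is_missing(v) else 1.0 for v in col_a]
    present_b = [0.0 if is_missing(v) else 1.0 for v in col_b]
    n = len(present_a)
    mean_a, mean_b = sum(present_a) / n, sum(present_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(present_a, present_b))
    var_a = sum((a - mean_a) ** 2 for a in present_a)
    var_b = sum((b - mean_b) ** 2 for b in present_b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / math.sqrt(var_a * var_b)


# Toy columns: the second is missing exactly where the first is present.
first = [1.0, None, 3.0, None]
second = [None, 2.0, None, 4.0]
print(nullity_correlation(first, second))
```

A value near +1 means two columns tend to be missing together and a value near -1 means one tends to be present when the other is absent. In this report, columns with no missing cells, such as BloodPressure and Insulin, would yield `None` under this sketch.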

Sample

First rows

   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin     BMI        DiabetesPedigreeFunction  Age   Outcome
0  6.0          148.0    72.000000      35.000000      79.799479   33.600000  0.627                     50.0  NaN
1  1.0          85.0     66.000000      29.000000      79.799479   26.600000  0.351                     31.0  0.0
2  8.0          183.0    64.000000      20.536458      79.799479   23.300000  0.672                     32.0  NaN
3  1.0          89.0     66.000000      23.000000      94.000000   28.100000  0.167                     21.0  0.0
4  0.0          137.0    40.000000      35.000000      168.000000  43.100000  NaN                       33.0  NaN
5  5.0          116.0    74.000000      20.536458      79.799479   25.600000  0.201                     30.0  0.0
6  3.0          78.0     50.000000      32.000000      88.000000   31.000000  0.248                     26.0  NaN
7  10.0         115.0    69.105469      20.536458      79.799479   35.300000  0.134                     29.0  0.0
8  8.0          125.0    96.000000      20.536458      79.799479   31.992578  0.232                     54.0  NaN
9  4.0          110.0    92.000000      20.536458      79.799479   37.600000  0.191                     30.0  0.0

Last rows

     Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin     BMI   DiabetesPedigreeFunction  Age   Outcome
627  0.0          123.0    72.0           20.536458      79.799479   36.3  0.258                     52.0  NaN
628  1.0          106.0    76.0           20.536458      79.799479   37.5  0.197                     26.0  0.0
629  6.0          190.0    92.0           20.536458      79.799479   35.5  0.278                     NaN   NaN
630  9.0          170.0    74.0           31.000000      79.799479   44.0  0.403                     43.0  NaN
631  9.0          89.0     62.0           20.536458      79.799479   22.5  0.142                     33.0  0.0
632  10.0         101.0    76.0           NaN            180.000000  32.9  0.171                     63.0  0.0
633  2.0          122.0    70.0           27.000000      79.799479   36.8  0.340                     27.0  0.0
634  5.0          121.0    72.0           23.000000      112.000000  26.2  0.245                     30.0  0.0
635  1.0          126.0    60.0           20.536458      79.799479   30.1  0.349                     47.0  NaN
636  1.0          93.0     70.0           31.000000      79.799479   30.4  0.315                     23.0  0.0